|
Soft subspace clustering algorithm for imbalanced data
CHENG Lingfang, YANG Tianpeng, CHEN Lifei
Journal of Computer Applications
2017, 37 (10):
2952-2957.
DOI: 10.11772/j.issn.1001-9081.2017.10.2952
Aiming at the problem that the current
K-means-type soft-subspace algorithms cannot effectively cluster imbalanced data due to uniform effect, a new partition-based algorithm was proposed for soft subspace clustering on imbalanced data. First, a bi-weighting method was proposed, where each attribute was assigned a feature-weight and each cluster was assigned a cluster-weight to measure its importance for clustering. Second, in order to make a trade-off between attributes with different types or those categorical attributes having various numbers of categories, a new distance measurement was then proposed for mixed-type data. Third, an objective function was defined for the subspace clustering algorithm on imbalanced data based on the bi-weighting method, and the expressions for optimizing both the cluster-weights and feature-weights were derived. A series of experiments were conducted on some real-world data sets and the results demonstrated that the bi-weighting method used in the new algorithm can learn more accurate soft-subspace for the clusters hidden in the imbalanced data. Compared with the existing
K-means-type soft-subspace clustering algorithms, the proposed algorithm yields higher clustering accuracy on imbalanced data, achieving about 50% improvements on the bioinformatic data used in the experiments.
Reference |
Related Articles |
Metrics
|
|